The Sharma-Mittal entropy $H_{\alpha,\beta}(p)$ [1, 2] of a probability density‡ $p$ is defined as

$$H_{\alpha,\beta}(p) = \frac{1}{1-\beta}\left(\left(\int p(x)^{\alpha}\,\mathrm{d}x\right)^{\frac{1-\beta}{1-\alpha}} - 1\right), \quad \text{with } \alpha > 0,\ \alpha \neq 1,\ \beta \neq 1. \tag{1}$$

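This bi-parametric family of entropies tends in limit cases to Rényi entropies $R_\alpha(p) = \frac{1}{1-\alpha}\log\int p(x)^{\alpha}\,\mathrm{d}x$ (for $\beta \to 1$), Tsallis entropies $T_\alpha(p) = \frac{1}{1-\alpha}\left(\int p(x)^{\alpha}\,\mathrm{d}x - 1\right)$ (for $\beta \to \alpha$), and Shannon entropy $H(p) = -\int p(x)\log p(x)\,\mathrm{d}x$ (for both $\alpha, \beta \to 1$).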
Sharma-Mittal entropy has previously been studied in the context of multidimensional 
harmonic oscillator systems [3]. 
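
The three limit cases can be checked numerically. The following is a minimal sketch, assuming a probability mass function so that the integral of Eq. (1) becomes a finite sum; the function names and the test distribution `p` are illustrative choices, not part of the original note.

```python
import math

def sharma_mittal(p, alpha, beta):
    """Sharma-Mittal entropy of Eq. (1) for a probability mass function p.

    Assumption: the integral is replaced by a sum over the entries of p.
    Requires alpha > 0, alpha != 1 and beta != 1.
    """
    m_alpha = sum(pi ** alpha for pi in p if pi > 0)
    return (m_alpha ** ((1 - beta) / (1 - alpha)) - 1) / (1 - beta)

def renyi(p, alpha):
    return math.log(sum(pi ** alpha for pi in p if pi > 0)) / (1 - alpha)

def tsallis(p, alpha):
    return (sum(pi ** alpha for pi in p if pi > 0) - 1) / (1 - alpha)

def shannon(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

p = [0.5, 0.25, 0.125, 0.125]
eps = 1e-6  # small offset used to approach each limit
print(sharma_mittal(p, 0.7, 1 + eps), renyi(p, 0.7))       # beta -> 1
print(sharma_mittal(p, 0.7, 0.7 + eps), tsallis(p, 0.7))   # beta -> alpha
print(sharma_mittal(p, 1 + eps, 1 + eps), shannon(p))      # alpha, beta -> 1
```

Each printed pair should agree up to a small error of the order of `eps`.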

Many usual statistical distributions, including the Gaussians and discrete multinomials (that is, normalized histograms), belong to the exponential families [4]. Those exponential families play a major role in the field of thermo-statistics [5], and admit the generic canonical decomposition

$$p_F(x|\theta) = \exp\left(\langle\theta, t(x)\rangle - F(\theta) + k(x)\right), \tag{2}$$

where $\langle\cdot,\cdot\rangle$ denotes the inner product, $F$ is a strictly convex $C^\infty$ function characterizing the family (called the log-normalizer since $F(\theta) = \log\int e^{\langle\theta, t(x)\rangle + k(x)}\,\mathrm{d}x$), $\theta \in \Theta$ is the natural parameter denoting the member of the family $\mathcal{E}_F = \{p_F(x|\theta) \mid \theta \in \Theta\}$, $t(x)$ is the sufficient statistic, and $k(x)$ is an auxiliary carrier measure [4]. The natural parameter space $\Theta = \{\theta \mid p_F(x;\theta) < \infty\}$ is an open convex set.

For example, the probability density of a multivariate Gaussian $p \sim N(\mu, \Sigma)$ centered at $\mu$ with positive-definite covariance matrix $\Sigma$ is conventionally written as

$$p(x|\mu, \Sigma) = \frac{1}{(2\pi)^{\frac{d}{2}}|\Sigma|^{\frac{1}{2}}}\exp\left(-\frac{(x-\mu)^T\Sigma^{-1}(x-\mu)}{2}\right), \tag{3}$$

where $|\Sigma| > 0$ denotes the determinant of the positive definite matrix. Rewriting the density of Eq. (3) to fit the canonical decomposition of Eq. (2), we get

$$p(x|\mu, \Sigma) = \exp\left(-\frac{1}{2}x^T\Sigma^{-1}x + x^T\Sigma^{-1}\mu - \frac{1}{2}\mu^T\Sigma^{-1}\mu - \frac{1}{2}\log(2\pi)^d|\Sigma|\right). \tag{4}$$



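Using the matrix trace cyclic property, we have $-\frac{1}{2}x^T\Sigma^{-1}x = \mathrm{tr}\left(-\frac{1}{2}x^T\Sigma^{-1}x\right) = \mathrm{tr}\left(xx^T \times \left(-\frac{1}{2}\Sigma^{-1}\right)\right)$, where $\mathrm{tr}$ denotes the matrix trace operator. It follows that

$$p(x|\mu, \Sigma) = \exp\left(\left\langle (x, xx^T), \left(\Sigma^{-1}\mu, -\frac{1}{2}\Sigma^{-1}\right)\right\rangle - F(\theta)\right), \tag{5}$$

$$= p(x|\theta), \tag{6}$$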

with $\theta = \left(\Sigma^{-1}\mu, -\frac{1}{2}\Sigma^{-1}\right)$ and $F(\theta) = \frac{1}{2}\log(2\pi)^d|\Sigma| + \frac{1}{2}\mu^T\Sigma^{-1}\mu$ (and $k(x) = 0$). In this decomposition, the natural parameter $\theta = \left(\Sigma^{-1}\mu, -\frac{1}{2}\Sigma^{-1}\right) = (v, M)$ consists of two parts: a vectorial part $v$, and a symmetric negative definite matrix part $M \prec 0$. The inner product of $\theta = (v, M)$ and $\theta' = (v', M')$ is defined as $\langle\theta, \theta'\rangle = v^Tv' + \mathrm{tr}(M^TM')$. For univariate normal distributions, the natural parameter $\theta$ is $\left(\frac{\mu}{\sigma^2}, -\frac{1}{2\sigma^2}\right)$. The order of the exponential family is the dimension of its natural parameter space $\Theta$. Normal $d$-dimensional distributions $N(\mu, \Sigma)$ form an exponential family of order $d + \frac{d(d+1)}{2} = \frac{d(d+3)}{2}$.

‡ For the sake of simplicity and without loss of generality, we consider the probability density function $p$ of a continuous random variable $X \sim p$ in this note. For multivariate densities $p$, the integral notation $\int$ denotes the corresponding multi-dimensional integral, so that we write for short $\int p(x)\,\mathrm{d}x = 1$. Our results hold for probability mass functions, and probability measures in general.

We have $M = -\frac{1}{2}\Sigma^{-1}$, that is $|\Sigma^{-1}| = \frac{1}{|\Sigma|} = |-2M|$, and $\mu^T\Sigma^{-1}\mu = -\frac{1}{2}v^TM^{-1}v$ (since $M^{-1} = -2\Sigma$, $-\frac{1}{2}M^{-1}v = \Sigma v = \mu$ and $M^{-T} = M^{-1}$). It follows that the log-normalizer $F$ expressed using the canonical natural parameters is

$$F(\mu, \Sigma) = \frac{1}{2}\log(2\pi)^d|\Sigma| + \frac{1}{2}\mu^T\Sigma^{-1}\mu, \tag{7}$$

$$F(v, M) = \frac{d}{2}\log 2\pi - \frac{1}{2}\log|-2M| - \frac{1}{4}v^TM^{-1}v. \tag{8}$$
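
The change of coordinates between $(\mu, \Sigma)$ and $(v, M)$ can be illustrated in the univariate case ($d = 1$), where $\theta = (\mu/\sigma^2, -1/(2\sigma^2))$. The sketch below uses hypothetical helper names and only checks that Eq. (7) and Eq. (8) return the same value for the same distribution.

```python
import math

def to_natural(mu, sigma2):
    """(mu, sigma^2) -> natural parameter (v, m) = (mu/sigma^2, -1/(2 sigma^2))."""
    return mu / sigma2, -1.0 / (2.0 * sigma2)

def to_source(v, m):
    """Inverse map: sigma^2 = -1/(2 m) and mu = sigma^2 * v (requires m < 0)."""
    sigma2 = -1.0 / (2.0 * m)
    return sigma2 * v, sigma2

def log_normalizer_source(mu, sigma2):
    """Eq. (7) with d = 1."""
    return 0.5 * math.log(2 * math.pi * sigma2) + 0.5 * mu * mu / sigma2

def log_normalizer_natural(v, m):
    """Eq. (8) with d = 1 (requires m < 0)."""
    return 0.5 * math.log(2 * math.pi) - 0.5 * math.log(-2 * m) - 0.25 * v * v / m

v, m = to_natural(1.5, 4.0)
print(log_normalizer_source(1.5, 4.0), log_normalizer_natural(v, m))
```

Both printed values should coincide up to floating-point rounding.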
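In order to calculate the Sharma-Mittal entropy of Eq. (1), let $M_\alpha(p) = \int p(x)^{\alpha}\,\mathrm{d}x$ so that

$$H_{\alpha,\beta}(p) = \frac{1}{1-\beta}\left(M_\alpha(p)^{\frac{1-\beta}{1-\alpha}} - 1\right). \tag{9}$$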

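Let us prove that for an arbitrary exponential family $\mathcal{E}_F = \{p_F(x|\theta) \mid \theta \in \Theta\}$

$$M_\alpha(p) = e^{F(\alpha\theta) - \alpha F(\theta)}\,E_p\left[e^{(\alpha-1)k(x)}\right]. \tag{10}$$

Proof:

$$M_\alpha(p) = \int p_F(x|\theta)^{\alpha}\,\mathrm{d}x = \int e^{\alpha\left(\langle t(x), \theta\rangle - F(\theta) + k(x)\right)}\,\mathrm{d}x, \tag{11}$$

$$= \int e^{\langle t(x), \alpha\theta\rangle - \alpha F(\theta) + \alpha k(x) + (1-\alpha)k(x) - (1-\alpha)k(x) + F(\alpha\theta) - F(\alpha\theta)}\,\mathrm{d}x, \tag{12}$$

$$= \int e^{F(\alpha\theta) - \alpha F(\theta)}\,p_F(x;\alpha\theta)\,e^{(\alpha-1)k(x)}\,\mathrm{d}x, \tag{13}$$

$$= e^{F(\alpha\theta) - \alpha F(\theta)}\int p_F(x;\alpha\theta)\,e^{(\alpha-1)k(x)}\,\mathrm{d}x, \tag{14}$$

$$= e^{F(\alpha\theta) - \alpha F(\theta)}\,E_p\left[e^{(\alpha-1)k(x)}\right], \tag{15}$$

where the expectation is taken with respect to the density $p_F(x;\alpha\theta)$ appearing in Eq. (14).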

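Observe that in Eq. (13) we require $\alpha\theta \in \Theta$ for a valid exponential family distribution. This is the case whenever the natural parameter space is a convex cone (e.g., Gaussian case). It follows from Eq. (10) that the Sharma-Mittal entropy of a distribution $p \sim \mathcal{E}_F$ belonging to an exponential family $\mathcal{E}_F$ is

$$H_{\alpha,\beta}(p) = \frac{1}{1-\beta}\left(e^{\frac{1-\beta}{1-\alpha}\left(F(\alpha\theta) - \alpha F(\theta)\right)}\left(E_p\left[e^{(\alpha-1)k(x)}\right]\right)^{\frac{1-\beta}{1-\alpha}} - 1\right). \tag{16}$$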

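In particular, when the auxiliary carrier measure $k(x) = 0$ [4] (including the above-mentioned multivariate Gaussian family), Eq. (16) becomes a closed-form formula since $E_p\left[e^{(\alpha-1)k(x)}\right] = E_p[1] = 1$:

$$H_{\alpha,\beta}(p) = \frac{1}{1-\beta}\left(e^{\frac{1-\beta}{1-\alpha}\left(F(\alpha\theta) - \alpha F(\theta)\right)} - 1\right), \tag{17}$$

$$= \frac{1}{1-\beta}\left(\left(e^{F(\alpha\theta) - \alpha F(\theta)}\right)^{\frac{1-\beta}{1-\alpha}} - 1\right). \tag{18}$$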
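
As a sanity check of the closed form with $k(x) = 0$, the sketch below evaluates Eq. (17) for a univariate Gaussian from its log-normalizer alone and compares it with Eq. (1), where the integral of $p(x)^\alpha$ is approximated by a midpoint rule. The function names, the integration window and the number of steps are assumptions made for illustration.

```python
import math

def log_normalizer(v, m):
    """Univariate Gaussian log-normalizer, Eq. (8) with d = 1 (requires m < 0)."""
    return 0.5 * math.log(2 * math.pi) - 0.5 * math.log(-2 * m) - 0.25 * v * v / m

def sharma_mittal_closed_form(v, m, alpha, beta):
    """Eq. (17) for k(x) = 0: only F(alpha*theta) - alpha*F(theta) is needed."""
    delta = log_normalizer(alpha * v, alpha * m) - alpha * log_normalizer(v, m)
    return (math.exp((1 - beta) / (1 - alpha) * delta) - 1) / (1 - beta)

def sharma_mittal_numeric(mu, sigma, alpha, beta, n=20_000, width=12.0):
    """Eq. (1) with the integral of p(x)^alpha approximated by a midpoint rule."""
    lo, hi = mu - width * sigma, mu + width * sigma
    h = (hi - lo) / n
    norm = 1.0 / (math.sqrt(2 * math.pi) * sigma)
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h
        total += (norm * math.exp(-0.5 * ((x - mu) / sigma) ** 2)) ** alpha
    m_alpha = total * h
    return (m_alpha ** ((1 - beta) / (1 - alpha)) - 1) / (1 - beta)

mu, sigma, alpha, beta = 1.0, 2.0, 0.5, 2.0
v, m = mu / sigma ** 2, -1.0 / (2.0 * sigma ** 2)
print(sharma_mittal_closed_form(v, m, alpha, beta))
print(sharma_mittal_numeric(mu, sigma, alpha, beta))
```

The closed form needs no integration at all, which is the practical benefit of Eq. (17).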
Figure 1. Plot of the Sharma-Mittal entropy (Eq. (22)) for the $4 \times 4$ covariance matrix $\Sigma = 4I$ (independent of the mean $\mu$), where $I$ denotes the identity matrix.



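We derive in limit cases expressions for the Rényi, Tsallis and Shannon entropies of an arbitrary exponential family (with $k(x) = 0$):

$$R_\alpha(p) = \lim_{\beta \to 1} H_{\alpha,\beta}(p) = \frac{1}{1-\alpha}\left(F(\alpha\theta) - \alpha F(\theta)\right), \tag{19}$$

$$T_\alpha(p) = \lim_{\beta \to \alpha} H_{\alpha,\beta}(p) = \frac{1}{1-\alpha}\left(e^{F(\alpha\theta) - \alpha F(\theta)} - 1\right), \tag{20}$$

$$H(p) = \lim_{\beta, \alpha \to 1} H_{\alpha,\beta}(p) = F(\theta) - \langle\theta, \nabla F(\theta)\rangle. \tag{21}$$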


Note that the Shannon entropy of a member of an exponential family $p \sim \mathcal{E}_F$ indexed with natural parameter $\theta$ can also be rewritten as $H(p) = H(\theta) = -F^*(\eta)$ with $\eta = \nabla F(\theta)$ the dual moment coordinates, and $F^*$ the Legendre $C^\infty$ convex conjugate of $F$ [6].

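Let us instantiate the generic formula of Eq. (17) to the case of multivariate Gaussians with mean parameter $\mu$ and covariance matrix $\Sigma$. We get

$$H_{\alpha,\beta}(\mu, \Sigma) = \frac{1}{1-\beta}\left(\frac{\left((2\pi)^{\frac{d}{2}}|\Sigma|^{\frac{1}{2}}\right)^{1-\beta}}{\alpha^{\frac{d(1-\beta)}{2(1-\alpha)}}} - 1\right), \tag{22}$$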



independent of $\mu$ (in 1D, $\Sigma = \sigma^2$ so that $|\Sigma|^{\frac{1}{2}} = \sigma$). Indeed, consider the expression $F(\alpha\theta) - \alpha F(\theta)$ in Eq. (17) for the Gaussian log-normalizer of Eq. (8). Using the fact that $|\alpha A| = \alpha^d|A|$ for $d$-dimensional matrices $A$, the term $F(\alpha\theta)$ is



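$$F(\alpha\theta) = F(\alpha v, \alpha M), \tag{23}$$

$$= \frac{d}{2}\log 2\pi - \frac{1}{2}\log\left(\alpha^d|-2M|\right) - \frac{1}{4}(\alpha v)^T(\alpha M)^{-1}(\alpha v), \tag{24}$$

$$= \frac{d}{2}\log 2\pi - \frac{d}{2}\log\alpha - \frac{1}{2}\log|-2M| - \frac{\alpha}{4}v^TM^{-1}v. \tag{25}$$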
Similarly, we have 

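$$\alpha F(\theta) = \alpha F(v, M) = \frac{\alpha d}{2}\log 2\pi - \frac{\alpha}{2}\log|-2M| - \frac{\alpha}{4}v^TM^{-1}v. \tag{26}$$

Thus by subtracting Eq. (26) from Eq. (25), we obtain

$$F(\alpha\theta) - \alpha F(\theta) = \frac{d(1-\alpha)}{2}\log 2\pi - \frac{d}{2}\log\alpha - \frac{1-\alpha}{2}\log|-2M|. \tag{27}$$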

Therefore, we deduce that 



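$$F(\alpha\theta) - \alpha F(\theta) = \log\frac{(2\pi)^{\frac{d(1-\alpha)}{2}}\,|-2M|^{-\frac{1-\alpha}{2}}}{\alpha^{\frac{d}{2}}}, \tag{28}$$

$$= \log\frac{(2\pi)^{\frac{d(1-\alpha)}{2}}\,|\Sigma|^{\frac{1-\alpha}{2}}}{\alpha^{\frac{d}{2}}}, \tag{29}$$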

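hence the result of Eq. (22). For 1D Gaussians with standard deviation $\sigma > 0$, this yields

$$H_{\alpha,\beta}(\mu, \sigma) = \frac{1}{1-\beta}\left(\frac{\left(\sqrt{2\pi}\,\sigma\right)^{1-\beta}}{\alpha^{\frac{1-\beta}{2(1-\alpha)}}} - 1\right). \tag{30}$$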

Note that the differential Sharma-Mittal entropy of Gaussians may potentially be 
negative. 
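
The sketch below evaluates Eq. (22) from the dimension $d$ and $\log|\Sigma|$ only, checks it against the 1D formula of Eq. (30), and exhibits a negative value for a narrow Gaussian. Working with $\log|\Sigma|$ rather than $|\Sigma|$ is an implementation choice made here, and the function names are illustrative.

```python
import math

def sharma_mittal_gaussian(log_det_sigma, d, alpha, beta):
    """Eq. (22): depends on d and log|Sigma| only, not on the mean."""
    # log of the numerator base (2 pi)^{d/2} |Sigma|^{1/2}
    log_num = 0.5 * d * math.log(2 * math.pi) + 0.5 * log_det_sigma
    # log of alpha^{d / (2 (1 - alpha))}
    log_den = d * math.log(alpha) / (2 * (1 - alpha))
    return (math.exp((1 - beta) * (log_num - log_den)) - 1) / (1 - beta)

def sharma_mittal_gaussian_1d(sigma, alpha, beta):
    """Eq. (30) for a univariate Gaussian with standard deviation sigma."""
    num = (math.sqrt(2 * math.pi) * sigma) ** (1 - beta)
    den = alpha ** ((1 - beta) / (2 * (1 - alpha)))
    return (num / den - 1) / (1 - beta)

# Setting of Figure 1: d = 4 and Sigma = 4 I, so that log|Sigma| = 4 log 4.
print(sharma_mittal_gaussian(4 * math.log(4.0), 4, 0.5, 2.0))

# A narrow univariate Gaussian (sigma = 0.1) gives a negative differential entropy.
print(sharma_mittal_gaussian(2 * math.log(0.1), 1, 0.5, 0.5))
```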

Figure 1 displays the plot of the Sharma-Mittal entropy for a $4 \times 4$ covariance matrix set to $4I$, where $I$ denotes the identity matrix.

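We also report respectively the Rényi, Tsallis and Shannon entropies for multivariate Gaussians:

$$R_\alpha(\mu, \Sigma) = \log\frac{(2\pi)^{\frac{d}{2}}|\Sigma|^{\frac{1}{2}}}{\alpha^{\frac{d}{2(1-\alpha)}}}, \tag{31}$$

$$T_\alpha(\mu, \Sigma) = \frac{1}{1-\alpha}\left(\frac{\left((2\pi)^{\frac{d}{2}}|\Sigma|^{\frac{1}{2}}\right)^{1-\alpha}}{\alpha^{\frac{d}{2}}} - 1\right), \tag{32}$$

$$H(\mu, \Sigma) = \log\sqrt{(2\pi e)^d|\Sigma|} = \frac{1}{2}\left(d + d\log 2\pi + \log|\Sigma|\right). \tag{33}$$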

Figure 2 displays the plots of the Rényi (Eq. (31)) and Tsallis (Eq. (32)) entropies for a $d \times d$-dimensional covariance matrix $\Sigma = \sigma^2 I = 4I$ ($\sigma = 2$).
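
The three Gaussian formulas can be compared with the limits of Eq. (22). The sketch below is an illustration under the same setting as the figures ($d = 4$, $\Sigma = 4I$); it uses the identity $H_{\alpha,\beta} = \left(e^{(1-\beta)R_\alpha} - 1\right)/(1-\beta)$, which follows from Eq. (9) and the definition of the Rényi entropy. Function names are illustrative.

```python
import math

def log_volume(log_det_sigma, d):
    """log of (2 pi)^{d/2} |Sigma|^{1/2}."""
    return 0.5 * d * math.log(2 * math.pi) + 0.5 * log_det_sigma

def renyi_gaussian(log_det_sigma, d, alpha):
    """Eq. (31)."""
    return log_volume(log_det_sigma, d) - d * math.log(alpha) / (2 * (1 - alpha))

def tsallis_gaussian(log_det_sigma, d, alpha):
    """Eq. (32)."""
    log_ratio = (1 - alpha) * log_volume(log_det_sigma, d) - 0.5 * d * math.log(alpha)
    return math.expm1(log_ratio) / (1 - alpha)

def shannon_gaussian(log_det_sigma, d):
    """Eq. (33)."""
    return 0.5 * (d + d * math.log(2 * math.pi) + log_det_sigma)

def sharma_mittal_gaussian(log_det_sigma, d, alpha, beta):
    """Eq. (22), written through the Renyi entropy of Eq. (31)."""
    r = renyi_gaussian(log_det_sigma, d, alpha)
    return math.expm1((1 - beta) * r) / (1 - beta)

d, log_det = 4, 4 * math.log(4.0)   # Sigma = 4 I in dimension 4
eps = 1e-7
print(sharma_mittal_gaussian(log_det, d, 0.5, 1 + eps), renyi_gaussian(log_det, d, 0.5))
print(sharma_mittal_gaussian(log_det, d, 0.5, 0.5 + eps), tsallis_gaussian(log_det, d, 0.5))
print(sharma_mittal_gaussian(log_det, d, 1 + eps, 1 + eps), shannon_gaussian(log_det, d))
```

`math.expm1` is used instead of `math.exp(x) - 1` to limit cancellation when the exponent is close to zero.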

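Information geometry [6] considers the underlying differential geometry induced by a divergence. From the Sharma-Mittal entropy, we can derive the Sharma-Mittal divergence [7] between two distributions $P \sim p$ and $Q \sim q$:

$$D_{\alpha,\beta}(p : q) = \frac{1}{\beta-1}\left(\left(\int p(x)^{\alpha}q(x)^{1-\alpha}\,\mathrm{d}x\right)^{\frac{1-\beta}{1-\alpha}} - 1\right), \quad \forall \alpha > 0,\ \alpha \neq 1,\ \beta \neq 1. \tag{34}$$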
Note that $D_{\alpha,\beta}(p : q) = 0$ if and only if $p = q$, since in that case $\int p(x)^{\alpha}q(x)^{1-\alpha}\,\mathrm{d}x = \int p(x)\,\mathrm{d}x = 1$.

For $\alpha, \beta \to 1$, the divergence tends to the renowned Kullback-Leibler divergence. Let $C_\alpha(p : q) = \int p(x)^{\alpha}q(x)^{1-\alpha}\,\mathrm{d}x$ denote the $\alpha$-divergence [6], related to the Hellinger integral of order $\alpha$: $H_\alpha(p, q) = 1 - C_\alpha(p, q)$. For $\alpha = \frac{1}{2}$, the similarity measure $C_{\frac{1}{2}}(p : q)$ is symmetric and called the Bhattacharyya coefficient. The Bhattacharyya coefficient is related to the following squared Hellinger distance:

$$H^2(p : q) = \frac{1}{2}\int\left(\sqrt{p(x)} - \sqrt{q(x)}\right)^2\mathrm{d}x = 1 - C_{\frac{1}{2}}(p : q). \tag{35}$$
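
The identity of Eq. (35) is easy to verify on a small example. The sketch below assumes probability mass functions (sums instead of integrals) and two arbitrary illustrative histograms `p` and `q`.

```python
import math

def c_alpha(p, q, alpha):
    """C_alpha(p : q) for probability mass functions (sum instead of integral)."""
    return sum(pi ** alpha * qi ** (1 - alpha) for pi, qi in zip(p, q))

def squared_hellinger(p, q):
    """Left-hand side of Eq. (35) for probability mass functions."""
    return 0.5 * sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))

p = [0.5, 0.3, 0.2]
q = [0.2, 0.2, 0.6]
print(squared_hellinger(p, q), 1 - c_alpha(p, q, 0.5))   # both sides of Eq. (35)
print(c_alpha(p, q, 0.5), c_alpha(q, p, 0.5))            # symmetry for alpha = 1/2
```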
We rewrite compactly the Sharma-Mittal divergence of Eq. (34) as

$$D_{\alpha,\beta}(p : q) = \frac{1}{\beta-1}\left(C_\alpha(p : q)^{\frac{1-\beta}{1-\alpha}} - 1\right). \tag{36}$$

Let us prove that for members $p = p_F(x|\theta)$ and $q = p_F(x|\theta')$ belonging to the same exponential family $\mathcal{E}_F$, we have $C_\alpha(p : q) = e^{-J_{F,\alpha}(\theta : \theta')}$, where

$$J_{F,\alpha}(\theta : \theta') = \alpha F(\theta) + (1-\alpha)F(\theta') - F(\alpha\theta + (1-\alpha)\theta') \tag{37}$$

is a Jensen difference divergence [8].
Proof: 

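$$C_\alpha(p : q) = \int p_F(x|\theta)^{\alpha}\,p_F(x|\theta')^{1-\alpha}\,\mathrm{d}x, \tag{38}$$

$$C_\alpha(\theta : \theta') = \int \exp\left(\alpha\left(\langle t(x), \theta\rangle - F(\theta) + k(x)\right)\right) \times \exp\left((1-\alpha)\left(\langle t(x), \theta'\rangle - F(\theta') + k(x)\right)\right)\mathrm{d}x, \tag{39}$$

$$= \int e^{\langle t(x), \alpha\theta + (1-\alpha)\theta'\rangle} \times \exp\left(-\alpha F(\theta) - (1-\alpha)F(\theta') + k(x)\right)\mathrm{d}x, \tag{40}$$

$$= \int e^{F(\alpha\theta + (1-\alpha)\theta') - \alpha F(\theta) - (1-\alpha)F(\theta')}\,p_F(x; \alpha\theta + (1-\alpha)\theta')\,\mathrm{d}x, \tag{41}$$

$$= e^{-J_{F,\alpha}(\theta : \theta')}\int p_F(x; \alpha\theta + (1-\alpha)\theta')\,\mathrm{d}x, \tag{42}$$

$$= e^{-J_{F,\alpha}(\theta : \theta')} > 0. \tag{43}$$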

Observe that for $\alpha \in (0, 1)$, $\alpha\theta + (1-\alpha)\theta' \in \Theta$ since $\Theta$ is an open convex set, and therefore the distribution $p_F(x; \alpha\theta + (1-\alpha)\theta')$ is well-defined in Eq. (41).

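It follows that the Sharma-Mittal divergence of distributions belonging to the same exponential family (even when $k(x) \neq 0$) is the following closed-form formula:

$$D_{\alpha,\beta}(p : q) = \frac{1}{\beta-1}\left(\left(e^{-J_{F,\alpha}(\theta : \theta')}\right)^{\frac{1-\beta}{1-\alpha}} - 1\right), \tag{44}$$

$$= \frac{1}{\beta-1}\left(e^{-\frac{1-\beta}{1-\alpha}J_{F,\alpha}(\theta : \theta')} - 1\right). \tag{45}$$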

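For multivariate Gaussians, let us make the Jensen difference divergence $J_{F,\alpha}$ explicit as the difference of two terms using the $(\mu, \Sigma)$ coordinate system:

$$\alpha F(\theta) + (1-\alpha)F(\theta') = \frac{d}{2}\log 2\pi + \frac{1}{2}\log|\Sigma|^{\alpha}|\Sigma'|^{1-\alpha} + \frac{\alpha}{2}\mu^T\Sigma^{-1}\mu + \frac{1-\alpha}{2}\mu'^T\Sigma'^{-1}\mu' \tag{46}$$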

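and $F(\alpha\theta + (1-\alpha)\theta')$. Let

$$\theta_\alpha = \alpha\theta + (1-\alpha)\theta' = \left(\alpha\Sigma^{-1}\mu + (1-\alpha)\Sigma'^{-1}\mu',\ -\frac{\alpha}{2}\Sigma^{-1} - \frac{1-\alpha}{2}\Sigma'^{-1}\right) = (v_\alpha, M_\alpha). \tag{47}$$

Denote by $\Sigma_\alpha = -\frac{1}{2}M_\alpha^{-1}$ and $\mu_\alpha = \Sigma_\alpha v_\alpha$ the corresponding parameters. Using Eq. (7), we have

$$F(\alpha\theta + (1-\alpha)\theta') = F(\mu_\alpha, \Sigma_\alpha) = \frac{1}{2}\log(2\pi)^d|\Sigma_\alpha| + \frac{1}{2}\mu_\alpha^T\Sigma_\alpha^{-1}\mu_\alpha. \tag{48}$$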

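It follows that the Jensen difference divergence between two Gaussian distributions $p \sim N(\mu, \Sigma)$ and $q \sim N(\mu', \Sigma')$ is given by the closed-form formula

$$J_{F,\alpha}(p : q) = \frac{1}{2}\left(\log\frac{|\Sigma|^{\alpha}|\Sigma'|^{1-\alpha}}{|\Sigma_\alpha|} + \alpha\mu^T\Sigma^{-1}\mu + (1-\alpha)\mu'^T\Sigma'^{-1}\mu' - \mu_\alpha^T\Sigma_\alpha^{-1}\mu_\alpha\right) \tag{49}$$

with

$$\Sigma_\alpha = \left(\alpha\Sigma^{-1} + (1-\alpha)\Sigma'^{-1}\right)^{-1}, \tag{50}$$

$$\mu_\alpha = \Sigma_\alpha v_\alpha = \Sigma_\alpha\left(\alpha\Sigma^{-1}\mu + (1-\alpha)\Sigma'^{-1}\mu'\right). \tag{51}$$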
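Letting $\Delta\mu = \mu' - \mu$, Eq. (49) can further be rewritten compactly as

$$J_{F,\alpha}(p : q) = \frac{1}{2}\left(\log\frac{|\Sigma|^{\alpha}|\Sigma'|^{1-\alpha}}{|\Sigma_\alpha|} + \alpha(1-\alpha)\,\Delta\mu^T\left(\alpha\Sigma' + (1-\alpha)\Sigma\right)^{-1}\Delta\mu\right). \tag{52}$$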

Thus we obtain a closed-form formula for the Sharma-Mittal divergence of multivariate Gaussians generalizing the Rényi $\alpha$-divergences, formerly reported in [9]:
Figure 3. Plot of the Sharma-Mittal divergence $D_{\alpha,\beta}$ (Eq. (54)) for univariate normal distributions with respective standard deviation $\sigma = \sqrt{2}$ and $\sigma' = 2$, and mean difference $\mu - \mu' = 4$.



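$$D_{\alpha,\beta}(N(\mu, \Sigma) : N(\mu', \Sigma')) = \frac{1}{\beta-1}\left(e^{-\frac{1-\beta}{2(1-\alpha)}\left(\log\frac{|\Sigma|^{\alpha}|\Sigma'|^{1-\alpha}}{|\Sigma_\alpha|} + \alpha(1-\alpha)\,\Delta\mu^T\left(\alpha\Sigma' + (1-\alpha)\Sigma\right)^{-1}\Delta\mu\right)} - 1\right), \tag{53}$$

$$= \frac{1}{\beta-1}\left(\left(\frac{|\Sigma|^{\alpha}|\Sigma'|^{1-\alpha}}{|\Sigma_\alpha|}\right)^{-\frac{1-\beta}{2(1-\alpha)}}\exp\left(-\frac{\alpha(1-\beta)}{2}\,\Delta\mu^T\left(\alpha\Sigma' + (1-\alpha)\Sigma\right)^{-1}\Delta\mu\right) - 1\right). \tag{54}$$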



Figure 3 shows the plot of the Sharma-Mittal divergence for univariate normal distributions with respective standard deviation $\sigma = \sqrt{2}$ and $\sigma' = 2$, and mean difference $\mu - \mu' = 4$ in Eq. (54). In practice, for numerical stability, we prefer to compute the divergence by first computing the Jensen difference divergence of Eq. (52) and then applying the generic formula of Eq. (45).
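
This two-step computation can be sketched for univariate Gaussians as follows. The code is an illustration under stated assumptions: variances are passed instead of standard deviations, the quadratic term is written in its 1D form $\alpha(1-\alpha)\Delta\mu^2 / (\alpha\sigma'^2 + (1-\alpha)\sigma^2)$ obtained by expanding Eq. (49), and a midpoint rule (with an arbitrary window and step count) serves only as an independent check of $C_\alpha = e^{-J_{F,\alpha}}$. All function names are hypothetical.

```python
import math

def jensen_difference_eq49(mu1, var1, mu2, var2, alpha):
    """Eq. (49) in 1D, with Sigma_alpha and mu_alpha from Eqs. (50) and (51)."""
    var_a = 1.0 / (alpha / var1 + (1 - alpha) / var2)
    mu_a = var_a * (alpha * mu1 / var1 + (1 - alpha) * mu2 / var2)
    log_term = alpha * math.log(var1) + (1 - alpha) * math.log(var2) - math.log(var_a)
    quad = alpha * mu1 ** 2 / var1 + (1 - alpha) * mu2 ** 2 / var2 - mu_a ** 2 / var_a
    return 0.5 * (log_term + quad)

def jensen_difference_compact(mu1, var1, mu2, var2, alpha):
    """Compact 1D form of the Jensen difference divergence, in the mean difference."""
    var_a = 1.0 / (alpha / var1 + (1 - alpha) / var2)
    log_term = alpha * math.log(var1) + (1 - alpha) * math.log(var2) - math.log(var_a)
    quad = alpha * (1 - alpha) * (mu2 - mu1) ** 2 / (alpha * var2 + (1 - alpha) * var1)
    return 0.5 * (log_term + quad)

def sharma_mittal_divergence(mu1, var1, mu2, var2, alpha, beta):
    """Eq. (45): first the Jensen difference divergence, then the generic formula."""
    j = jensen_difference_compact(mu1, var1, mu2, var2, alpha)
    return math.expm1(-(1 - beta) / (1 - alpha) * j) / (beta - 1)

def gaussian_pdf(x, mu, var):
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)

def c_alpha_numeric(mu1, var1, mu2, var2, alpha, n=40_000, width=12.0):
    """Midpoint-rule approximation of the integral of p^alpha q^(1-alpha)."""
    spread = width * math.sqrt(max(var1, var2))
    lo, hi = min(mu1, mu2) - spread, max(mu1, mu2) + spread
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h
        total += gaussian_pdf(x, mu1, var1) ** alpha * gaussian_pdf(x, mu2, var2) ** (1 - alpha)
    return total * h

# Setting of Figure 3: sigma = sqrt(2), sigma' = 2, mu - mu' = 4.
args = (0.0, 2.0, -4.0, 4.0)
print(jensen_difference_eq49(*args, 0.5), jensen_difference_compact(*args, 0.5))
print(-math.log(c_alpha_numeric(*args, 0.5)))
print(sharma_mittal_divergence(*args, 0.5, 2.0))
```

The first two printed values agree with the third up to the accuracy of the numerical integration.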

The underlying distribution is usually not explicitly given, so that we first need to estimate the distribution or related quantities like its entropy [10]. Leonenko et al. [11] proposed a method to estimate entropies using the $k$-nearest neighbor graph ($k$-NN) of an independently and identically distributed sample set $x_1, \ldots, x_n$. However, their method suffers from the curse of dimensionality of computing $k$-NN graphs and falls short when dealing with moderate dimensions. For exponential families, we can estimate the natural parameter of an exponential family using the maximum likelihood estimator (MLE) that admits the unique global optimum [4, 5] $\hat{\theta}$ such that

$$\nabla F(\hat{\theta}) = \frac{1}{n}\sum_{i=1}^{n} t(x_i). \tag{55}$$

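For multivariate Gaussians, from the sufficient statistic $t(x) = (x, xx^T)$ we deduce that $\nabla F(\hat{\theta}) = (\hat{\mu}, \hat{\Sigma} + \hat{\mu}\hat{\mu}^T) = \left(\frac{1}{n}\sum_{i=1}^{n} x_i, \frac{1}{n}\sum_{i=1}^{n} x_i x_i^T\right)$.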
This yields a simple and fast scheme to estimate the Sharma-Mittal entropy (or divergence) from $n$ observations sampled identically and independently from an exponential family distribution: estimate the natural parameter using Eq. (55) and apply the formula of Eq. (17).
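
For univariate Gaussians, the scheme can be sketched as follows. This is a minimal illustration, not a definitive implementation: the synthetic sample, the seed, the sample size and the function names are all assumptions, and the entropy is evaluated with the Gaussian instance of Eq. (17), that is Eq. (30), written in terms of the variance.

```python
import math
import random

def mle_gaussian(samples):
    """MLE of (mu, sigma^2) from the averaged sufficient statistics (x, x^2), Eq. (55)."""
    n = len(samples)
    mean = sum(samples) / n
    second_moment = sum(x * x for x in samples) / n
    return mean, second_moment - mean * mean

def sharma_mittal_1d(sigma2, alpha, beta):
    """Eq. (30), written with the variance sigma^2."""
    log_ratio = 0.5 * math.log(2 * math.pi * sigma2) - math.log(alpha) / (2 * (1 - alpha))
    return math.expm1((1 - beta) * log_ratio) / (1 - beta)

rng = random.Random(0)                       # fixed seed for reproducibility
samples = [rng.gauss(1.0, 2.0) for _ in range(20_000)]

mu_hat, var_hat = mle_gaussian(samples)
print(mu_hat, var_hat)                       # close to (1, 4)
print(sharma_mittal_1d(var_hat, 0.5, 2.0))   # plug-in estimate
print(sharma_mittal_1d(4.0, 0.5, 2.0))       # value for the true variance
```

No nearest-neighbor graph is needed: the cost is a single pass over the sample followed by one closed-form evaluation.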

To conclude, let us note that any arbitrary smooth density can be approximated by an exponential family of order depending on the approximation precision [12, 13] (enforcing no extra auxiliary carrier measure, that is, with $k(x) = 0$). Thus we can approximate the Sharma-Mittal entropy of an arbitrary probability density [33] by approximating it with a close exponential family, and then applying the closed-form formula of Eq. (17). We believe that Eq. (17), Eq. (22), Eq. (45) (numerically stable) and Eq. (54) will prove useful when experimenting for suitable parameters $(\alpha, \beta)$ in various statistical signal tasks [9].